Abstract

A quantile summary is a data structure that approximates to $\varepsilon$-relative error the order statistics of a
much larger underlying dataset.

In this paper we develop a randomized online quantile summary for the cash register data input
model and comparison data domain model that uses $O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ words of memory. This improves upon
the previous best upper bound of $O(\frac{1}{\varepsilon}\log^{3/2}\frac{1}{\varepsilon})$ by Agarwal et al. (PODS 2012). Further, by a lower
bound of Hung and Ting (FAW 2010), no deterministic summary for the comparison model can outperform
our randomized summary in terms of space complexity. Lastly, our summary has the nice property that
$O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ words suffice to ensure that the success probability is $1 - e^{-\mathrm{poly}(1/\varepsilon)}$.


1 Introduction 

A quantile summary $S$ is a fundamental data structure that summarizes an underlying dataset $X$ of size
$n$ in space much less than $n$. Given a query $\phi$, $S$ returns a sample $y$ of $X$ such that the rank of $y$ in $X$
is (probably) approximately $\phi n$. Quantile summaries are used in sensor networks to aggregate data in an
energy-efficient manner and in database query optimizers to generate query execution plans.

Quantile summaries have been developed for a variety of different models and metrics. The data input 
model we consider is the standard online cash register streaming model, in which a new item is added to the 
dataset at each new timestep, and the total number of items is not known until the end. The data domain 
model we consider is the comparison model, in which stream items come from an arbitrary ordered domain 
(and specifically, not necessarily from the integers). 

Formally, our quantile summary problem is defined over a totally ordered domain $V$ and by an error
parameter $\varepsilon < 1/2$. There is a dataset $X$ that is initially empty. Time occurs in discrete steps. In timestep
$t$, stream item $x_t$ arrives and is then processed, and then any quantile queries $\phi$ in that step are received
and processed. To be definite, we pick the first timestep to be 1. We write $X_t$ or $X(t)$ for the $t$-item prefix
stream $x_1 \ldots x_t$ of $X$. The goal is to maintain at all times $t$ a summary $S_t$ of the dataset $X_t$ that, given
any query $\phi \in (0,1]$, can return a sample $y = y(\phi)$ so that $|R(y,X_t) - \phi t| \le \varepsilon t$, where $R(a,Z)$ is the
rank of item $a$ in set $Z$, defined as $|\{z \in Z : z \le a\}|$. For randomized summaries, we only require that
$\forall t\,\forall\phi,\ P(|R(y,X_t) - \phi t| \le \varepsilon t) \ge 2/3$; that is, $y$'s rank is only probably close to $\phi t$, not definitely close. In
fact, it will be easier to deal with the rank directly, so we define $\rho = \phi t$ and use that in what follows.
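To make the definitions concrete, here is a minimal Python sketch of the rank function $R(a,Z)$ and of the per-query accuracy requirement. The query is stated directly as a rank $\rho = \phi t$. The names `rank` and `meets_guarantee` are illustrative and do not come from the paper.

```python
def rank(a, Z):
    """R(a, Z) = |{z in Z : z <= a}|, the rank of item a in the multiset Z."""
    return sum(1 for z in Z if z <= a)


def meets_guarantee(y, prefix, rho, eps):
    """True if the returned sample y satisfies |R(y, X_t) - rho| <= eps * t,
    where prefix is the t-item prefix X_t and rho = phi * t is the target rank."""
    t = len(prefix)
    return abs(rank(y, prefix) - rho) <= eps * t


X_t = [5, 1, 3, 7]
print(rank(3, X_t))                      # 2
print(meets_guarantee(3, X_t, 2, 0.1))   # True
print(meets_guarantee(7, X_t, 1, 0.25))  # False
```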

1.1 Previous work 

The two most directly relevant pieces of prior work are randomized online quantile summaries for the cash
register/comparison model. Aside from oblivious sampling algorithms (which require storing $\Omega(1/\varepsilon^2)$
samples), we are unaware of any other randomized online quantile summaries that work in the comparison model.

The newer of the two is that of Agarwal, Cormode, Huang, Phillips, Wei, and Yi [1, 2]. Among other
results, Agarwal et al. develop a randomized online quantile summary for the cash register/comparison
model that uses $O(\frac{1}{\varepsilon}\log^{3/2}\frac{1}{\varepsilon})$ words of memory. This summary has the nice property that any two such
summaries can be combined to form a summary of the combined underlying dataset without loss of accuracy
or increase in size.

The earlier such summary is that of Manku, Rajagopalan, and Lindsay [6], which uses $O(\frac{1}{\varepsilon}\log^2\frac{1}{\varepsilon})$
space. At a high level, their algorithm downsamples the input stream in a non-uniform way and feeds the
downsampled stream into a deterministic summary, which periodically adjusts the downsampling rate.

We note here that our algorithm is inspired by the algorithm of Manku et al. but has important
differences. We defer a discussion of the similarities and differences to section 4, after the presentation of our
algorithm in section 3.

For the comparison model, the best deterministic online summary to date is the (GK) summary of
Greenwald and Khanna [3], which uses $O(\frac{1}{\varepsilon}\log(\varepsilon n))$ space. This improved upon a deterministic (MRL)
summary of Manku, Rajagopalan, and Lindsay [5] and a summary implied by Munro and Paterson [7],
which use $O(\frac{1}{\varepsilon}\log^2(\varepsilon n))$ space.

A more restrictive domain model than the comparison model is the bounded universe model, in which
elements are drawn from the integers $\{1, \ldots, U\}$. For this model there is a deterministic online summary by
Shrivastava, Buragohain, Agrawal, and Suri [8] that uses $O(\frac{1}{\varepsilon}\log U)$ space.

Not much exists in the way of lower bounds for this problem. There is a simple lower bound of $\Omega(1/\varepsilon)$,
which intuitively comes from the fact that no one sample can satisfy more than $2\varepsilon n$ different rank queries.
For the comparison model, Hung and Ting [4] developed a deterministic $\Omega(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ lower bound. Whether
this bound can be extended to hold for our weaker probabilistic guarantee, and whether our algorithm can
be modified to satisfy the stronger deterministic guarantee, are both open questions.

1.2 Our results 

In the next section we describe a simple $O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ streaming summary that is online except that it requires
$n$ to be given up front and that it is unable to process queries until it has seen a constant fraction of the
input stream. In section 3 we develop this simple summary into a fully online summary that can answer
queries at any point in time. We close in section 4 by examining the similarities and differences between our
summary and previous work and by discussing a design approach for similar streaming problems.


2 A simple streaming summary 

Before we describe our algorithm we must first describe its two main components in a bit more detail than
was used in the introduction. The two components are Bernoulli sampling and the GK summary [3].

2.1 Bernoulli sampling 

Bernoulli sampling downsamples a stream $X$ of size $n$ to a sample stream $S$ by choosing to include each next
item into $S$ with independent probability $m/n$. (As stated, this requires knowing the size of $X$ in advance.)
At the end of processing $X$, the expected size of $S$ is $m$, and the expected rank of any sample $y$ in $S$ is
$E(R(y,S)) = \frac{m}{n}R(y,X)$. In fact, for any time $t \le n$ and partial streams $X_t$ and $S_t$, where $S_t$ is the sample
stream of $X_t$, we have $E(|S_t|) = mt/n$ and $E(R(y,S_t)) = \frac{m}{n}R(y,X_t)$. To generate an estimate $\hat R(y,X_t)$ of $R(y,X_t)$
from $S_t$ we use $\hat R(y,X_t) = \frac{n}{m}R(y,S_t)$. The following theorem bounds the probability that $S$ is very large
or that $\hat R(y,X_t)$ is very far from $R(y,X_t)$ (for any given time $t \ge n/64$, but not for all times $t = n/64, \ldots, n$
combined). The proof is folklore, a simple application of Chernoff bounds.

Theorem 2.1. For all times $t \ge n/64$, $P(|S_t| > 2tm/n) < \exp(-m/192)$.

Further, for all times $t \ge n/64$ and items $y$, $P(|\hat R(y,X_t) - R(y,X_t)| > \varepsilon t/8) < 2\exp(-\varepsilon^2 m/12288)$.

Proof. For the first part, $P(|S_t| > 2tm/n) < \exp(-tm/3n) \le \exp(-m/192)$ (since $t \ge n/64$).

For the second part, $P(|\hat R(y,X_t) - R(y,X_t)| > \varepsilon t/8)$ is equal to $P(|R(y,S_t) - E(R(y,S_t))| > \varepsilon tm/8n)$.
The Chernoff bound is $P(|R(y,S_t) - E(R(y,S_t))| > \delta E(R(y,S_t))) < 2\exp(-\min\{\delta,\delta^2\}E(R(y,S_t))/3)$.
Here, $\delta = \varepsilon tm/(8nE(R(y,S_t)))$, so $P < 2\exp(-\varepsilon^2 t^2 m^2/(192 n^2 E(R(y,S_t)))) \le 2\exp(-\varepsilon^2 m/12288)$. □
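The last step compresses the two branches of $\min\{\delta,\delta^2\}$ in the Chernoff bound. The derivation below writes both branches out, using $E(R(y,S_t)) = \frac{m}{n}R(y,X_t) \le tm/n$, $t \ge n/64$, and $\varepsilon < 1$. Each branch gives an exponent of at least $\varepsilon^2 m/12288$:

```latex
\begin{align*}
\delta \le 1:&\quad \frac{\delta^2 E(R(y,S_t))}{3}
  = \frac{\varepsilon^2 t^2 m^2}{192\, n^2 E(R(y,S_t))}
  \ge \frac{\varepsilon^2 t m}{192\, n}
  \ge \frac{\varepsilon^2 m}{192 \cdot 64} = \frac{\varepsilon^2 m}{12288},\\
\delta > 1:&\quad \frac{\delta\, E(R(y,S_t))}{3}
  = \frac{\varepsilon t m}{24\, n}
  \ge \frac{\varepsilon m}{1536}
  = \frac{8\varepsilon m}{12288}
  \ge \frac{\varepsilon^2 m}{12288}.
\end{align*}
```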
This means that, given any $1 \le \rho \le t$, if we return the sample $y \in S_t$ with $R(y,S_t) = \rho m/n$, then
$R(y,X_t)$ is likely to be close to $\rho$.
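The following is a minimal Python sketch of Bernoulli sampling and of the scaled estimate $\hat R(y,X_t) = \frac{n}{m}R(y,S_t)$. As in this section, it assumes $n$ is known in advance. The helper names are hypothetical. The items are the integers $0, \ldots, n-1$, so the true rank of $y$ is simply $y+1$.

```python
import random


def bernoulli_sample(stream, n, m, rng):
    """Keep each item independently with probability m/n (n known up front)."""
    p = m / n
    return [x for x in stream if rng.random() < p]


def estimate_rank(y, sample, n, m):
    """Estimate R(y, X) by scaling the rank of y in the sample by n/m."""
    return (n / m) * sum(1 for s in sample if s <= y)


rng = random.Random(0)
n, m = 100_000, 20_000
X = list(range(n))                # R(y, X) = y + 1 for these items
S = bernoulli_sample(X, n, m, rng)
y = n // 2
print(len(S), estimate_rank(y, S, n, m), y + 1)
```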

2.2 GK summary 

The GK summary is a deterministic summary that can answer queries to relative error over any prefix of
the received stream. If $G_t$ is the summary after inserting the first $t$ items $X_t$ from stream $X$ into $G$, then,
given any $1 \le \rho \le t$, $G_t$ can return a sample $y \in X_t$ so that $|R(y,X_t) - \rho| \le \varepsilon t/8$. Greenwald and Khanna
guarantee in [3] that $G_t$ uses $O(\frac{1}{\varepsilon}\log(\varepsilon t))$ words. We call this the GK guarantee.

2.3 Our summary 

We combine Bernoulli sampling with the GK summary by downsampling the input data stream $X$ to a
sample stream $S$ and then feeding $S$ into a GK summary $G$, as shown in Figure 1.

Figure 1: The big picture. The input stream $X$ passes through a Bernoulli sampler, and the resulting sample stream $S$ is fed into the GK summary $G$.


The key reason this gives us a small summary is that we never need to store $S$: each time we sample
an item into $S$, we immediately feed it into $G$. Therefore, we only use as much space as $G(S(X_t))$ uses. In
particular, as long as $m = O(\mathrm{poly}(1/\varepsilon))$, we use only $O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ words.

To answer a query $\rho$ for $X_t$ we ask $G_t$ the query $\rho m/n$ and return the resulting sample $y$. There is a slight
issue in that $\rho m/n$ may be larger than $|S_t|$; but if the approximation guarantee holds for the largest item in
$X_t$, then $|S_t| \ge (t - \varepsilon t/8)m/n$, so $\rho m/n \le tm/n \le |S_t| + \varepsilon tm/(8n)$, and using $\min\{\rho m/n, |S_t|\}$ instead will not cause more
than $\varepsilon/8$ relative error in the approximation.

The probability that our sample stream $S_t$ is not too big (has at most $2tm/n$ samples) is at least
$1 - \exp(-m/192)$. If this happens to be the case, then the probability that all of its samples $y$ are good (have
$|R(y,S_t) - E(R(y,S_t))| \le \varepsilon tm/8n$) is at least $1 - 4m\exp(-\varepsilon^2 m/12288)$ by Theorem 2.1 and the union bound.

Choosing a sufficiently large $m = O(\mathrm{poly}(1/\varepsilon))$ suffices to guarantee that both events occur with total probability at least 2/3.

Further, if both events for $S_t$ occur, then the total error introduced by $S_t$ and $G_t$ together is at most $\varepsilon t/2$.
Suppose that $G_t$ returns $y$ when given $\rho m/n$. This means that $|R(y,S_t) - \rho m/n| \le \varepsilon|S_t|/8 \le \varepsilon(2tm/n)/8$ by
the GK guarantee. Since both events for $S_t$ occur, we also have $|R(y,S_t) - \frac{m}{n}R(y,X_t)| \le \varepsilon tm/4n$ (and only
$\varepsilon tm/8n$ in the case that we don't truncate $\rho m/n$ to $|S_t|$). Thus, $|\frac{m}{n}R(y,X_t) - \rho m/n| \le \varepsilon tm/2n$. Equivalently,
$|R(y,X_t) - \rho| \le \varepsilon t/2$.
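The following minimal Python sketch shows the fixed-$n$ summary of this section: Bernoulli sampling at rate $m/n$ feeds a summary, and a query $\rho$ is forwarded as $\min\{\rho m/n, |S_t|\}$. To keep the example self-contained, a sorted list of all samples stands in for the GK summary $G$. That is an assumption, and it means this sketch stores $S$ and does not meet the GK space bound. A GK implementation with the same insert/query interface would take its place. `FixedNSummary` is a hypothetical name.

```python
import math
import random
from bisect import insort


class FixedNSummary:
    """Bernoulli sampling at rate m/n feeding a summary G.
    A sorted list of samples stands in for the GK summary in this sketch."""

    def __init__(self, n, m, seed=0):
        self.n, self.m = n, m
        self.rng = random.Random(seed)
        self.samples = []              # stand-in for G(S(X_t))

    def insert(self, x):
        if self.rng.random() < self.m / self.n:
            insort(self.samples, x)

    def query(self, rho):
        """Ask the summary for rank min(rho*m/n, |S_t|) and return that sample."""
        k = math.ceil(rho * self.m / self.n)
        k = min(max(k, 1), len(self.samples))
        return self.samples[k - 1]


n, m, eps = 100_000, 20_000, 0.1
items = list(range(n))
random.Random(1).shuffle(items)        # R(y, X_n) = y + 1 once all items are in
summary = FixedNSummary(n, m, seed=2)
for x in items:
    summary.insert(x)
for rho in (1, n // 4, n // 2, n):
    print(rho, summary.query(rho) + 1)  # target rank vs. true rank of the answer
```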


2.4 Caveats 

There are two serious issues with this summary. The first is that it requires us to know the value of $n$ in
advance to perform the sampling. The second is that, as a byproduct of the sampling, we can only obtain approximation
guarantees after we have seen at least 1/64 (or at least some constant fraction) of the items. This means
that while the algorithm is sufficient for approximating order statistics over streams stored on disk, more is
needed to get it to work for online streaming applications, in which (1) the stream size $n$ is not known in
advance, and (2) queries must be answerable approximately at all times $t \le n$, not just when $t \ge n/64$.

Adapting the idea of our basic streaming summary to work online constitutes the next section and the bulk
of our contribution. We start with a high-level overview of our online summary algorithm. In section 3.1 we
formally define an initial version of our algorithm whose expected size at any given time is $O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ words.
In section 3.2 we show that our algorithm guarantees that $\forall n\,\forall\rho,\ P(|R(y,X_n) - \rho| \le \varepsilon n) \ge 1 - \exp(-1/\varepsilon)$. In
section 3.3 we discuss the slight modifications necessary to get a deterministic $O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ space complexity,
and also perform a time complexity analysis.


3 An online summary 

Our algorithm works in rows, which are illustrated in appendix A. Row $r$ is a summary of the first $2^r 32m$
stream items. Since we don't know how many items will actually be in the stream, we can't start all of these
rows running at the outset. Therefore, we start each row $r \ge 1$ once we have seen 1/64 of its total items.
However, since we can't save these items for every row we start, we need to construct an approximation of this
fraction of the stream, which we do by using the summary of the previous row, and join this approximating
stream with the new items that arrive while the row is live. We then wait until the row has seen a full half
of its items before we permit it to start answering queries; this dilutes the influence of approximating the
1/64 of its input that we couldn't store.

Operation within a row is very much like the operation of our fixed-n streaming summary. We feed the 
joint approximate prefix + new item stream through a Bernoulli sampler to get a sample stream, which is 
then fed into a GK summary (which is stored). After row r has seen half of its items, its GK summary 
becomes the one used to answer quantile queries. When row r + 1 has seen 1/64 of its total items, row r 
generates an approximation of those items from its GK summary and feeds them as a stream into row r + 1. 

Row 0 is slightly different in order to bootstrap the algorithm. There is no join step since there is no 
previous row to join. Also, row 0 is active from the start. Lastly, we get rid of the sampling step so that we 
can answer queries over timesteps 1... m/2. 

After the first $32m$ items, row 0 is no longer needed, so we can clean up the space used by its GK
summary. Similarly, after the first $2^r 32m$ items, row $r$ is no longer needed. The upshot of this is that we
never need storage for more than six rows at a time. Since each GK summary uses $O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ words, the six
live GK summaries use only a constant factor more.

Our error analysis, on the other hand, will require us to look back as many as $\Omega(\log 1/\varepsilon)$ rows to ensure
our approximation guarantee. We stress that we will not need to actually store these $\Omega(\log 1/\varepsilon)$ rows for our
guarantee to hold; we will only need that they didn't have any bad events (as will be defined) when they
were alive.

3.1 Algorithm description 

Our algorithm works in rows. Each row $r$ has its own copy $G_r$ of the GK algorithm that approximates its
input to $\varepsilon/8$ relative error. For each row $r$ we define several streams: $A_r$ is the prefix stream of row $r$, $B_r$ is
its suffix stream, $R_r$ is its prefix stream replacement (generated by the previous row), $J_r$ is the joint stream
$R_r$ followed by $B_r$, $S_r$ is its sample stream, and $Q_r$ is a one-time stream generated from $G_r$ by querying it
with ranks $\rho_1, \ldots, \rho_{8/\varepsilon}$, where $\rho_q = q(\varepsilon/8)(m/32)$.

The prefix stream $A_r = x_1 \ldots x_{2^{r-1}m}$ for row $r \ge 1$, importantly, is not directly received by row $r$. Instead,
at the end of timestep $2^{r-1}m$, row $r-1$ generates $Q_{r-1}$ and duplicates each of those $8/\varepsilon$ items $2^{r-1}\varepsilon m/8$
times to get the replacement prefix $R_r$, which is then immediately fed into row $r$ before timestep $2^{r-1}m+1$
begins.
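Below is a minimal sketch of building $R_r$. It assumes `query_prev(k)`, a hypothetical callable, returns an item of approximate rank $k$ in $Y_{r-1}$, with ranks expressed in units of the full stream rather than of the sample stream $S_{r-1}$. The $q$-th target rank is then $q\,\varepsilon t_r/8$, and the output has exactly $t_r = 2^{r-1}m$ items. The example uses an exact summary of the prefix. In that case every rank in $R_r$ is off by less than $\varepsilon t_r/8$.

```python
import random


def replacement_prefix(query_prev, eps, m, r):
    """Build R_r for row r >= 1: ask row r-1 for the items at ranks
    q * (eps * t_r / 8), q = 1..8/eps, and repeat each answer eps * t_r / 8
    times, so that |R_r| = t_r = 2^(r-1) m."""
    t_r = 2 ** (r - 1) * m
    dup, nq = eps * t_r / 8, 8 / eps
    if not (dup.is_integer() and nq.is_integer()):
        raise ValueError("pick eps and m so that eps*t_r/8 and 8/eps are integers")
    dup, nq = int(dup), int(nq)
    prefix = []
    for q in range(1, nq + 1):
        prefix.extend([query_prev(q * dup)] * dup)
    return prefix


m, eps = 256, 0.125
X_prefix = list(range(m))               # the t_1 = m items row 1 never sees
random.Random(0).shuffle(X_prefix)
exact = sorted(X_prefix)                # exact stand-in for row 0's summary
R1 = replacement_prefix(lambda k: exact[k - 1], eps, m, r=1)
print(len(R1), R1[:6])
```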

Each row can be live or not and active or not. Row 0 is live in timesteps $1 \ldots 32m$, and row $r \ge 1$ is live
in timesteps $2^{r-1}m+1 \ldots 2^r 32m$. Live rows require space; once a row is no longer live we can free up the
space it used. Row 0 is active in timesteps $1 \ldots 32m$, and row $r \ge 1$ is active in timesteps $2^r 16m+1 \ldots 2^r 32m$.
This definition means that exactly one row $r(t)$ is active in any given timestep $t$. Any queries that are asked
in timestep $t$ are answered by $G_{r(t)}$. Given query $\rho$, we ask $G_{r(t)}$ for $\rho/(2^{r(t)}32)$ (or simply $\rho$ when $r(t) = 0$, since row 0 does not downsample) and return the result.
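The schedule above is easy to get off by one. The small Python sketch below encodes the live and active windows and brute-forces a check over the first timesteps: exactly one row is active, that row is live, and at most six rows are live. The helper names are hypothetical. `query_scale` is the divisor applied to a rank query before it is forwarded to $G_{r(t)}$; it is taken to be 1 for row 0, which does not downsample.

```python
def live_window(r, m):
    """Inclusive timesteps in which row r is live."""
    return (1, 32 * m) if r == 0 else (2 ** (r - 1) * m + 1, 2 ** r * 32 * m)


def active_window(r, m):
    """Inclusive timesteps in which row r answers queries."""
    return (1, 32 * m) if r == 0 else (2 ** r * 16 * m + 1, 2 ** r * 32 * m)


def active_row(t, m):
    """r(t): the smallest r with t <= 2^r * 32m."""
    r = 0
    while t > 2 ** r * 32 * m:
        r += 1
    return r


def query_scale(r):
    """Divisor applied to a rank query before forwarding it to G_r."""
    return 1 if r == 0 else 2 ** r * 32


def within(t, window):
    return window[0] <= t <= window[1]


m = 4
for t in range(1, 2001):
    active = [r for r in range(12) if within(t, active_window(r, m))]
    live = [r for r in range(12) if within(t, live_window(r, m))]
    assert active == [active_row(t, m)]          # exactly one active row
    assert active[0] in live and len(live) <= 6  # it is live; at most six live rows
print("schedule consistent for t = 1..2000")
```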

At each timestep $t$, when item $x_t$ arrives, it is fed as the next item in the suffix stream $B_r$ for each live
row $r$. $B_r$ joined with $R_r$ defines the joint input stream $J_r$. For $r \ge 1$, $J_r$ is downsampled to the sample
stream $S_r$ by sampling each item independently with probability $1/(2^r 32)$. For row 0, no downsampling is
performed, so $S_0 = J_0$. Lastly, $S_r$ is fed into $G_r$.





Appendix A shows the operation of and the communication between the first six rows. Solid arrows
indicate continuous streams and dashed arrows indicate one-time messages. Appendix B is a pseudocode
listing of the algorithm.


3.2 Error analysis 

Define $C_r = x(2^r 32m+1), x(2^r 32m+2), \ldots$ and $Y_r$ to be $R_r$ followed by $B_r$ and then $C_r$. That is, $Y_r$ is just
the continuation of $J_r$ for the entire length of the input stream.

Fix some time $t$. All of our claims will be relative to time $t$; that is, if we write $S_r$ we mean $S_r(t)$. Our
error analysis proceeds as follows. We start by proving that $R(y,Y_r)$ is a good approximation of $R(y,Y_{r-1})$
when certain conditions hold for $S_{r-1}$. By induction, this means that $R(y,Y_r)$ is a good approximation of
$R(y, X = Y_0)$ when the conditions hold for all of $S_0, \ldots, S_{r-1}$, and actually it's enough for the conditions to
hold for just $S_{r-\log(1/\varepsilon)}, \ldots, S_{r-1}$ to get a good approximation. Having proven this claim, we then prove that
the result $y = y(\rho)$ of a query to our summary has $R(y,X)$ close to $\rho$. Lastly, we show that $m = O(\mathrm{poly}(1/\varepsilon))$
suffices to ensure that the conditions hold for $S_{r-\log(1/\varepsilon)}, \ldots, S_{r-1}$ with very high probability ($1 - e^{-1/\varepsilon}$).

Lemma 3.1. Let $\alpha_r$ be the event that $|S_r| > 2m$, and let $\beta_r$ be the event that any of the first $\le 2m$ samples
$z$ in $S_r$ has $|2^r 32 R(z,S_r) - R(z,Y_r)| > \varepsilon t/8$. Say that $S_r$ is good if neither $\alpha_r$ nor $\beta_r$ occurs (or if $r = 0$).

For all $r \ge 1$ such that $t \ge t_r = 2^{r-1}m$, and for all items $y$, if $S_{r-1}(t_r)$ is good then $|R(y,Y_r) - R(y,Y_{r-1})| \le 2^r\varepsilon m$.


Proof. At the end of time $t_r$ we have $Y_r(t_r) = R_r(t_r)$, which is each item $y(\rho_q)$ in $Q_{r-1}$ duplicated $\varepsilon t_r/8$ times.
If $S_{r-1}(t_r)$ is good, then by Theorem 2.1 and the GK guarantee we have $|R(y(\rho_q), Y_{r-1}(t_r)) - 2^{r-1}32\rho_q| \le \varepsilon t_r/2$.

Fix $q$ so that $y(\rho_q) \le y < y(\rho_{q+1})$, where $y(\rho_0)$ and $y(\rho_{1+8/\varepsilon})$ are defined to be $\min V$ and $\sup V$ for
completeness. Fixing $q$ this way implies that $R(y, Y_r(t_r)) = 2^{r-1}32\rho_q$. By the above bound on $R(y(\rho_q), Y_{r-1}(t_r))$
we also have that $2^{r-1}32\rho_q - \varepsilon t_r/2 \le R(y, Y_{r-1}(t_r)) \le 2^{r-1}32\rho_{q+1} + \varepsilon t_r/2$.

Putting these two bounds together, and recalling that $\rho_q = q\varepsilon m/256$, we find that $|R(y,Y_r(t_r)) -
R(y, Y_{r-1}(t_r))| \le 2^r\varepsilon m$. For each time $t$ after $t_r$, the new item $x_t$ changes the rank of $y$ in both streams $Y_r$
and $Y_{r-1}$ by the same additive offset, so $|R(y,Y_r) - R(y,Y_{r-1})| = |R(y,Y_r(t_r)) - R(y,Y_{r-1}(t_r))| \le 2^r\varepsilon m$. □
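The final arithmetic step of this proof is written out below. It assumes $\rho_q = q\varepsilon m/256$, so consecutive query ranks in $Y_{r-1}$ differ by $2^{r-1}32(\rho_{q+1}-\rho_q) = 2^{r-1}\varepsilon m/8 = \varepsilon t_r/8$. It also uses $t_r = 2^{r-1}m$, so $\varepsilon t_r/2 = 2^{r-2}\varepsilon m$:

```latex
\begin{align*}
R(y,Y_r(t_r)) - R(y,Y_{r-1}(t_r))
  &\le 2^{r-1}32\rho_q - \bigl(2^{r-1}32\rho_q - \varepsilon t_r/2\bigr)
   = 2^{r-2}\varepsilon m,\\
R(y,Y_{r-1}(t_r)) - R(y,Y_r(t_r))
  &\le 2^{r-1}32(\rho_{q+1}-\rho_q) + \varepsilon t_r/2
   = \frac{2^{r-1}\varepsilon m}{8} + 2^{r-2}\varepsilon m
   = \frac{5}{16}\,2^{r}\varepsilon m \le 2^{r}\varepsilon m.
\end{align*}
```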


By applying this lemma inductively we can bound the difference between $Y_r$ and $X = Y_0$:

Corollary 3.2. For all $r \ge 1$ such that $t \ge t_r = 2^{r-1}m$, if all of $S_0(t_1), S_1(t_2), \ldots, S_{r-1}(t_r)$ are good, then
$|R(y,Y_r) - R(y,X)| \le 2^{r+1}\varepsilon m$.
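Corollary 3.2 is the triangle inequality along the chain $Y_r, Y_{r-1}, \ldots, Y_0 = X$. Lemma 3.1 bounds each link, and it applies to every $i \le r$ because $t \ge t_r \ge t_i$:

```latex
|R(y,Y_r) - R(y,X)| \le \sum_{i=1}^{r} |R(y,Y_i) - R(y,Y_{i-1})|
  \le \sum_{i=1}^{r} 2^{i}\varepsilon m = (2^{r+1}-2)\,\varepsilon m < 2^{r+1}\varepsilon m.
```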


Ensuring that all of these $S_i$ are good would require $m$ to grow with $n$, which would be bad. Happily, it
is enough to require only the last $\log_2(1/\varepsilon)$ sample streams to be good, since the other items we disregard
constitute only a small fraction of the total stream.


Corollary 3.3. Let $d = \log_2(1/\varepsilon)$. For all $r \ge 1$ such that $t \ge t_r = 2^{r-1}m$, if all of $S_{r-1}(t_r), \ldots, S_{r-d}(t_{r-d+1})$
are good, then $|R(y,Y_r) - R(y,X)| \le 2^{r+2}\varepsilon m$.


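Proof. By Lemma 3.1 we have $|R(y,Y_r) - R(y,Y_{r-d})| \le 2^{r+1}\varepsilon m$. At time $t \ge t_{r-d}$, $Y_{r-d}$ and $X$ share all items
except possibly the first $t_{r-d} = 2^{r-1}m/2^d = 2^{r-1}\varepsilon m$ items. Thus

$$|R(y,Y_r) - R(y,X)| \le |R(y,Y_r) - R(y,Y_{r-d})| + |R(y,Y_{r-d}) - R(y,X)| \le 2^{r+1}\varepsilon m + 2^{r-1}\varepsilon m \le 2^{r+2}\varepsilon m.$$

□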


We now prove that if the last several sample streams were good, then querying our summary will give
us a good result.

Lemma 3.4. Let $d = \log_2(1/\varepsilon)$ and $r = r(t)$. If all of $S_r(t), S_{r-1}(t_r), \ldots, S_{r-d}(t_{r-d+1})$ are good, then querying
our summary with rank $\rho$ (= querying the active GK summary $G_r$ with $\rho/(2^r 32)$) returns $y = y(\rho)$ such that
$|R(y,X) - \rho| \le \varepsilon t$.
Proof. By Corollary 3.3 we have $|R(y,Y_r) - R(y,X)| \le 2^{r+2}\varepsilon m \le \varepsilon t/2$. By Theorem 2.1 and the GK
guarantee, $|R(y,Y_r) - \rho| \le \varepsilon t/2$. □
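Two facts make this proof go through. First, row $r = r(t) \ge 1$ is active, so $t > 2^r 16m$, and the bound of Corollary 3.3 is therefore less than a quarter of $\varepsilon t$. Second, the two halves combine by the triangle inequality:

```latex
2^{r+2}\varepsilon m = \frac{2^{r+4}m}{4}\,\varepsilon < \frac{\varepsilon t}{4} \le \frac{\varepsilon t}{2},
\qquad
|R(y,X) - \rho| \le |R(y,X) - R(y,Y_r)| + |R(y,Y_r) - \rho|
  \le \frac{\varepsilon t}{2} + \frac{\varepsilon t}{2} = \varepsilon t.
```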


Lastly, we prove that $m = O(\mathrm{poly}(1/\varepsilon))$ suffices to ensure that all of $S_r(t), S_{r-1}(t_r), \ldots, S_{r-d}(t_{r-d+1})$
are good with probability at least $1 - e^{-1/\varepsilon}$.

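Lemma 3.5. Let $d = \log_2(1/\varepsilon)$ and $r = r(t)$. If $m$ is at least a sufficiently large polynomial in $1/\varepsilon$, then all of $S_r(t), S_{r-1}(t_r), \ldots, S_{r-d}(t_{r-d+1})$
are good with probability at least $1 - e^{-1/\varepsilon}$.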


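Proof. There are at most $1 + \log_2(1/\varepsilon) \le 4\ln(1/\varepsilon)$ of these sample streams in total. Theorem 2.1 and the union
bound give us $P(\text{some } \alpha_r \text{ occurs}) \le 4\ln(1/\varepsilon)\exp(-m/192)$ and $P(\text{some } \beta_r \text{ occurs}) \le 16m\ln(1/\varepsilon)\exp(-\varepsilon^2 m/12288)$.

Together, $P = P(\text{some } S_r \text{ is not good}) \le 20m\ln(1/\varepsilon)\exp(-\varepsilon^2 m/12288)$. It suffices to choose $m$ large enough
(polynomial in $1/\varepsilon$) that the right-hand side is below $e^{-1/\varepsilon}$, to obtain $P < e^{-1/\varepsilon}$. □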
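To see concretely that a polynomial $m$ suffices, the following Python sketch finds the smallest integer $m$ for which the bound $20m\ln(1/\varepsilon)\exp(-\varepsilon^2 m/12288)$ from this proof drops below $e^{-1/\varepsilon}$. It assumes $\varepsilon < 1/2$. It is only a numerical illustration of this particular (loose) bound. The function names are hypothetical, and the comparison is done in log space to avoid underflow.

```python
import math


def union_bound_holds(m, eps):
    """True if 20 m ln(1/eps) exp(-eps^2 m / 12288) < exp(-1/eps),
    compared in log space."""
    log_lhs = math.log(20 * m * math.log(1 / eps)) - eps ** 2 * m / 12288
    return log_lhs < -1 / eps


def smallest_m(eps):
    """Smallest integer m >= 1 with union_bound_holds(m, eps). The log of the
    bound is concave in m and starts above -1/eps at m = 1, so the predicate is
    false up to some threshold and true afterwards: doubling plus bisection works."""
    hi = 1
    while not union_bound_holds(hi, eps):
        hi *= 2
    lo = hi // 2                      # predicate is false at lo
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if union_bound_holds(mid, eps):
            hi = mid
        else:
            lo = mid
    return hi


print(smallest_m(0.25), smallest_m(0.1))
```

Because the constants in Theorem 2.1 are loose, the resulting $m$ is large. What matters for the space bound is that it is polynomial in $1/\varepsilon$, so the $\log(\varepsilon m)$ factor inside the GK bound stays $O(\log(1/\varepsilon))$.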


3.3 Space and time complexity 


A minor issue with the algorithm is that, as written in section 3.1, we do not actually have a bound on the
worst-case space complexity of the algorithm; we only have a bound on the space needed at any given point
in time. This issue is due to the fact that there are low-probability events in which $|S_r|$ can get arbitrarily
large, and the fact that over $n$ items there are a total of $\Omega(\log n)$ sample streams. The space complexity of
the algorithm is $O(\max_r |S_r|)$, and bounding this value with constant probability using the Chernoff bound
appears to require that $\max_r |S_r| = \Omega(\log\log n)$, which is too big.

Fortunately, fixing this problem is simple. Instead of feeding every sample of $S_r$ into the GK summary
$G_r$, we only feed each next sample if $G_r$ has seen fewer than $2m$ samples so far. That is, we deterministically restrict
$G_r$ to receiving only $2m$ samples. Lemmas 3.1 through 3.4 condition on the goodness of the sample streams
$S_r$, which ensures that the $G_r$ receive at most $2m$ samples each, and the claim of Lemma 3.5 is independent
of the operation of $G_r$. Therefore, by restricting each $G_r$ to receive at most $2m$ inputs we can ensure that
the space complexity is deterministically $O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$ without breaking our error guarantees.
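The cap can be sketched as a thin wrapper around any summary that supports insertion. Here a plain sorted list stands in for $G_r$. That substitution is an assumption made only to keep the example self-contained. `CappedSummary` is a hypothetical name.

```python
from bisect import insort


class CappedSummary:
    """Forward at most `cap` insertions (2m in section 3.3) to the inner summary.
    A sorted list stands in for the GK summary G_r in this sketch."""

    def __init__(self, cap):
        self.cap = cap
        self.inserted = 0
        self.items = []

    def insert(self, x):
        if self.inserted < self.cap:    # drop samples once G_r has seen 2m
            self.inserted += 1
            insort(self.items, x)


g = CappedSummary(cap=4)
for x in [5, 3, 9, 1, 7, 2]:
    g.insert(x)
print(g.items)  # [1, 3, 5, 9]
```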

From a practical perspective, the assumption in the streaming setting is that new items arrive over the
input stream $X$ at a high rate, so both the worst-case per-item processing time and the amortized time
to process $n$ items are important. For our per-item time complexity, the limiting factor is the duplication
step that occurs at the end of each time $t_r = 2^{r-1}m$, which makes the worst-case per-item processing time
as large as $\Omega(n)$. Instead, at time $t_r$ we could generate $Q_{r-1}$ and store it in $O(1/\varepsilon)$ words, and then on each
arrival $t = 2^{r-1}m + 1, \ldots, 2^r m$ we could insert both $x_t$ and also the next item in $R_r$. By the time $t_{r+1} = 2t_r$
that we generate $Q_r$, all items in $R_r$ will have been inserted into $J_r$. Thus the worst-case per-item time
complexity is …, where … is the worst-case per-item time to query or insert into one of our GK
summaries. Over $2^r 32m$ items there are at most $2m$ insertions into any one GK summary, so the amortized
time over $n$ items in either case is $O(\ldots \log(n/32m) \ldots)$, where $T_{GK}$ is the amortized per-item time to query
or insert into one of our GK summaries.
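The spreading trick can be sketched as follows. The sketch assumes $R_r$ is represented implicitly by the stored list $Q_{r-1}$ and the repetition count $\varepsilon t_r/8$. Item $t - t_r$ of $R_r$ can then be computed in $O(1)$ time at arrival $t$, so no single timestep pays for the whole duplication. `replacement_item` is a hypothetical helper.

```python
def replacement_item(t, t_r, Q_prev, dup):
    """Item t - t_r of R_r (1-indexed), computed on demand from the stored list
    Q_prev = Q_{r-1}, where R_r repeats each entry of Q_prev dup times.
    Returns None outside the spreading window t_r < t <= 2 t_r."""
    if not (t_r < t <= 2 * t_r):
        return None
    j = t - t_r - 1                 # 0-based position in R_r
    return Q_prev[j // dup]


Q_prev, dup = ["a", "b", "c"], 2
t_r = len(Q_prev) * dup             # |R_r| = t_r
spread = [replacement_item(t, t_r, Q_prev, dup) for t in range(t_r + 1, 2 * t_r + 1)]
materialized = [q for q in Q_prev for _ in range(dup)]
print(spread == materialized)       # True
```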

The pseudocode listing in appendix B includes the changes of this section.


4 Discussion 

Our starting point is a very natural idea of Manku et al. that, due to subtle technical difficulties, saw
no further application to the quantiles problem for sixteen years. This key idea is to downsample the input
stream and feed the resulting sample stream into a deterministic summary data structure (compare our
figure 2 with figure 1 on page 254 of [6]). At a very high level, we are simply replacing their deterministic
$O(\frac{1}{\varepsilon}\log^2(\varepsilon n))$ MRL summary with the deterministic $O(\frac{1}{\varepsilon}\log(\varepsilon n))$ GK summary [3]. However, as evidenced
by the fact that fourteen years after the GK summary was published the state of the art was the randomized
$O(\frac{1}{\varepsilon}\log^{3/2}\frac{1}{\varepsilon})$ summary of Agarwal et al. [1, 2], adapting this idea to the GK summary without
superconstant overhead is nontrivial.
Our implementation of this idea is conceptually different from the implementation of Manku et al. in
two respects. First, we use the GK algorithm strictly as a black box, whereas Manku et al. peek into the
internals of their MRL algorithm, using its algorithm-specific interface (New, Collapse, Output) rather
than the more generic interface (Insert, Query). Working with the GK algorithm at that level of detail
is already unpleasant. Using the generic interface, our implementation could just as easily replace the GK
boxes in the diagram in appendix A with MRL boxes; or, for the bounded universe model, with boxes
running the q-digest summary of Shrivastava et al. [8].

The second respect in which our algorithm differs critically from that of Manku et al. is that we operate
on streams rather than on stream items. We use this approach in our proof strategy too; the key step in
our error analysis, Lemma 3.1, is a statement about (what to us are) static objects, so we can trade the
complexity of dealing with time-varying data structures for a simple induction.

The approach we developed to reduce a deterministic summary to a randomized summary was: 


1. For a fixed n, downsample the input stream, feed the resulting sample stream into the deterministic 
summary, and prove a probabilistic bound. 

2. Run an infinite number of copies of step 1, for exponentially growing values of n. 

3. Replace a constant fraction prefix of each copy with an approximation generated by the previous copy, 
and prove using step 1 that this approximation probably doesn’t cause too much error. 

4. Use step 3 inductively to prove a probabilistic bound for the entire stream. 

We believe (albeit on the basis of this problem and our algorithm alone) that developing streaming algorithms 
that operate on streams rather than on stream items is likely to be a useful design approach for many 
problems. 
A Diagram for online algorithm

Figure 2: Each row $r$ has its own copy $G_r$ of the GK algorithm that approximates its input to $\varepsilon/8$ relative
error. $A_r$ is the prefix stream of row $r$, $B_r$ is its suffix stream, $R_r$ is its prefix stream replacement (generated
by the previous row), $J_r$ is the joint stream $R_r$ followed by $B_r$, $S_r$ is its sample stream, and $Q_r$ is a one-time
stream generated from $G_r$ at time $2^r m$ to get the replacement prefix $R_{r+1}$.
B Pseudocode for online algorithm 


The differences between the algorithms of sections 3.1 and 3.3 are marked.

Initially, allocate space for G_0. Mark row 0 as live and active.
for t = 1, 2, ... do
    foreach live row r >= 0 do
        with probability 1/(2^r·32) (probability 1 for row 0) do
            if section 3.1 then
                Insert x_t into G_r.
            else if section 3.3 then
                Insert x_t into G_r if G_r has seen < 2m insertions.
        if (section 3.3) r >= 1 and 2^(r-1) m < t <= 2^r m and G_r has seen < 2m insertions then
            with probability 1/(2^r·32) do
                Also insert item t - 2^(r-1) m of R_r into G_r.
    if t = 2^(r-1) m for some r >= 1 then
        Allocate space for G_r. Mark row r as live.
        if section 3.1 then
            Query G_(r-1) with ρ_1, ..., ρ_(8/ε) to get y_1, ..., y_(8/ε).
            for q = 1, ..., 8/ε do
                repeat 2^(r-1)·εm/8 times
                    with probability 1/(2^r·32) do
                        Insert y_q into G_r.
        else if section 3.3 then
            Store Q_(r-1), to implicitly define R_r.
    if t = 2^r·16m for some r >= 1 then
        Mark row r as active. Unmark row r-1 as active.
    if t = 2^r·32m for some r >= 0 then
        Unmark row r as live. Free space for G_r.
    on query ρ do
        Let r(t) be the active row.
        Query G_(r(t)) for rank ρ/(2^(r(t))·32) (rank ρ if r(t) = 0). Return the result.

Algorithm 1: Procedural listing of the algorithm. 
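Below is a compact, runnable Python sketch of Algorithm 1. It rests on three assumptions. First, a sorted list of samples stands in for each GK summary, so it does not achieve the GK space bound; a GK implementation exposing insert and rank-query would be dropped in instead. Second, $R_r$ is fed all at once, as in section 3.1, while the $2m$ cap of section 3.3 is applied. Third, row 0 neither samples nor is capped, following the prose of section 3.1. The class and method names are hypothetical.

```python
import math
import random
from bisect import bisect_right, insort


class Row:
    """Row r of the online summary. A sorted list of samples stands in for the
    GK summary G_r (assumption: exact storage, not the GK space bound)."""

    def __init__(self, r, m, rng):
        self.rate = 1.0 if r == 0 else 1.0 / (2 ** r * 32)  # row 0: S_0 = J_0
        self.cap = None if r == 0 else 2 * m                 # section 3.3 cap
        self.samples = []
        self.rng = rng

    def feed(self, x):
        """Next item of J_r: Bernoulli-sample it into S_r, then insert into G_r."""
        if self.rng.random() < self.rate:
            if self.cap is None or len(self.samples) < self.cap:
                insort(self.samples, x)

    def query(self, rank):
        """Item of approximate rank `rank` in Y_r (rank rescaled by the sampling rate)."""
        if not self.samples:
            raise ValueError("row has no samples yet")
        k = min(max(math.ceil(rank * self.rate), 1), len(self.samples))
        return self.samples[k - 1]


class OnlineQuantileSketch:
    """Row r is seeded at t_r = 2^(r-1) m, is live until 2^r * 32m, and row r(t)
    answers queries. R_r is fed all at once, as in section 3.1."""

    def __init__(self, eps, m, seed=0):
        self.eps, self.m, self.t = eps, m, 0
        self.rng = random.Random(seed)
        self.rows = {0: Row(0, m, self.rng)}  # live rows keyed by r

    def insert(self, x):
        self.t += 1
        t, m = self.t, self.m
        # Free every row whose last live timestep 2^r * 32m has passed.
        self.rows = {r: row for r, row in self.rows.items() if t <= 2 ** r * 32 * m}
        for row in self.rows.values():  # x_t is the next item of each B_r
            row.feed(x)
        q, rem = divmod(t, m)
        if rem == 0 and (q & (q - 1)) == 0:  # end of timestep t_r = 2^(r-1) m
            r = q.bit_length()
            prev, new = self.rows[r - 1], Row(r, m, self.rng)
            dup = max(1, round(self.eps * t / 8))  # copies per item of Q_{r-1}
            for j in range(1, round(8 / self.eps) + 1):
                y = prev.query(j * dup)  # rank j * eps * t_r / 8 in Y_{r-1}
                for _ in range(dup):
                    new.feed(y)  # R_r enters J_r before timestep t_r + 1
            self.rows[r] = new

    def query(self, rho):
        """Answer rank query rho (1 <= rho <= t) using the active row r(t)."""
        r = 0
        while self.t > 2 ** r * 32 * self.m:
            r += 1
        return self.rows[r].query(rho)


rng = random.Random(1)
n, m, eps = 120_000, 1024, 0.25
items = list(range(n))
rng.shuffle(items)
sk = OnlineQuantileSketch(eps, m, seed=2)
ok = []
for x in items:
    sk.insert(x)
    if sk.t in (20_000, 60_000, n):       # rows 0, 1 and 2 are active here
        prefix = sorted(items[: sk.t])
        for rho in (sk.t // 4, sk.t // 2, 3 * sk.t // 4, sk.t):
            y = sk.query(rho)
            ok.append(abs(bisect_right(prefix, y) - rho) <= eps * sk.t)
print(all(ok), len(sk.rows))
```

Swapping `Row.samples` for a real GK summary, and spreading $R_r$ over the next $t_r$ arrivals as in section 3.3, would move this sketch toward the space- and time-bounded version analyzed above.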